Instance Sampling for Automatic Identification of Arabic Pleonastic Pronouns
Author: M. Abdul-Mageed
Abstract
The term anaphora describes backward reference to items that occurred earlier in a text (see, e.g., Mitkov, 2002). The item that points back is called an anaphor, and the item to which it refers is called its antecedent. Identifying an anaphor’s antecedent is termed anaphora resolution and is considered one of the most difficult tasks in natural language processing (NLP), since it relies on both linguistic and world knowledge. Resolving an anaphor to its antecedent is crucial in many NLP applications, such as text summarization, machine translation, information extraction, and question answering. I employ the term pleonastic pronouns to refer to pronouns that do not introduce a new referent in discourse. In the literature, the terms non-referential, non-anaphoric, and expletive have also been used to label these pronouns. The ability to identify non-referential pronouns before attempting anaphora resolution is important, since the system then does not have to attempt to resolve such pronouns and therefore makes fewer errors. In addition, as Boyd, Gegg-Harrison, and Byron (2005) maintain in their study of non-referential it in English, the detection of non-referential pronouns could be incorporated into a part-of-speech tagger or parser, or treated as an initial step in semantic interpretation. The number of non-referential pronouns is sometimes non-trivial, so their identification deserves adequate attention. In this paper I apply a memory-based learning method (Daelemans and van den Bosch, 2005) to identify non-referential pronouns in an annotated sub-segment of the Penn Arabic Treebank, v3. I follow Evans (2001) in defining the task as binary classification and investigate which instance sampling methods give the best results. Finally, I investigate which features are most important for the classification task. I achieve a successful classification rate of 89%.
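The general idea of memory-based binary classification combined with instance sampling can be sketched as follows. This is only an illustrative sketch, not the paper's implementation: the overlap distance, the down-sampling strategy, the function names, the three-slot feature layout, and the toy instances are all assumptions introduced here, since the abstract does not specify the features or the sampling methods that were compared.

```python
import random
from collections import Counter

def overlap_distance(a, b):
    """Count the feature positions at which two instances disagree."""
    return sum(x != y for x, y in zip(a, b))

def knn_classify(memory, instance, k=1):
    """Label an instance by majority vote among its k nearest stored examples."""
    nearest = sorted(memory, key=lambda ex: overlap_distance(ex[0], instance))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def downsample(examples, majority_label, ratio=1.0, seed=0):
    """Keep every minority example and a random subset of the majority class.

    `ratio` is the desired number of majority examples per minority example.
    """
    rng = random.Random(seed)
    majority = [ex for ex in examples if ex[1] == majority_label]
    minority = [ex for ex in examples if ex[1] != majority_label]
    keep = min(len(majority), int(len(minority) * ratio))
    return minority + rng.sample(majority, keep)

# Invented toy instances: (pronoun form, previous tag, next tag) -> label.
# These values are placeholders, not data from the Penn Arabic Treebank.
toy = [
    (("hw", "PART", "NOUN"), "pleonastic"),
    (("hw", "PART", "ADJ"), "pleonastic"),
    (("hw", "VERB", "NOUN"), "referential"),
    (("hy", "VERB", "NOUN"), "referential"),
    (("hy", "NOUN", "VERB"), "referential"),
    (("hw", "NOUN", "VERB"), "referential"),
    (("hy", "VERB", "ADJ"), "referential"),
    (("hw", "VERB", "VERB"), "referential"),
]

# Balance the two classes before storing the instances in memory.
balanced = downsample(toy, majority_label="referential", ratio=1.0)
prediction = knn_classify(toy, ("hw", "PART", "NOUN"), k=1)
```

Down-sampling the majority class is one of several possible sampling schemes; it is shown here only because it is simple to state, and other schemes may behave differently on real data.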
Similar resources
Statistical Identification of Pleonastic Pronouns
This paper describes an algorithm to identify pleonastic pronouns using statistical techniques. The training step uses a coreference annotated corpus of English and focuses on a set of pronouns such as it. As far as we know, there is no corpus with a pleonastic annotation. The main idea of the algorithm was then to recast the definition of pleonastic pronouns as pronouns that never occur in a c...
Identification of Pleonastic It Using the Web
In a significant minority of cases, certain pronouns, especially the pronoun it, can be used without referring to any specific entity. This phenomenon of pleonastic pronoun usage poses serious problems for systems aiming at even a shallow understanding of natural language texts. In this paper, a novel approach is proposed to identify such uses of it : the extrapositional cases are identified us...
Pronouns Without Explicit Antecedents: How do We Know When a Pronoun is Referential?
Pronouns without explicit noun phrase antecedents pose a problem for any theory of reference resolution. We report here on an empirical study of such pronouns in the Santa Barbara Corpus of Spoken American English, a corpus of spontaneous, casual conversation. Analysis of 2,046 third person personal pronouns in fourteen transcripts indicates that 330 (or 16.1%) lack NP antecedents. These pronou...
A Modular Architecture for Anaphora Resolution
Anaphora resolution attempts to determine the correct antecedent of an anaphor (the term pointing back). In what follows, we propose an algorithm for the resolution of anaphoric pronouns that relies on lexical and syntactic knowledge incorporated in a modular approach based on constraints and preferences. Our objective was to find the correct antecedent to the following subject pronouns (il, il...
Translation of Power and Solidarity Pronouns in Qur’anic Rhetoric
Translation of the Holy Quran can be difficult for translators in terms of accuracy and translatability. Sometimes translators fail to render the Quranic thoughts because of the lack of language features in target languages. This results in an unfavorable interpretation. One of the challenging aspects of translating Quran is reference switching as rhetorical devices, which are widespread i...